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Transcriptional pulsing has been observed in both prokary- 
otes and eukaryotes and plays a crucial role in cell to cell 
variability of protein and mRIMA numbers. The issue is 
how the time constants associated with episodes of tran- 
scriptional bursting impact cellular mRIMA and protein dis- 
tributions and reciprocally, to what extent experimentally 
observed distributions can be attributed to transcriptional 
pulsing. We address these questions by investigating the 
exact time-dependent solution of the Master equation for 
a transcriptional pulsing model of mRNA distributions. We 
find a plethora of results: we show that, among others, bi 
modal and long-tailed (power law) distributions occur in 
the steady state as the rate constants are varied over bi- 
ologically significant time scales. Since steady state dis- 
tributions may not be reached experimentally we present 
results for the time evolution of the distributions. Because 
cellular behavior is essentially determined by proteins, we 
investigate the effect of the different mRNA distributions 
on the corresponding protein distributions. We delineate 
the regimes of rate constants for which the protein distribu- 
tion mimics the mRIMA distribution and those for which the 
protein distribution deviates significantly from the mRNA 
distribution. 

stochastic gene expression | transcriptional pulsing | mRNA distribu- 
tion | protein distribution | bimodal distribution 



Introduction 

Cell to cell variability in mRNA and protein num- 
bers is now recognized as a major aspect of cellular 
response to stimuli, a variability which is hidden in 
cell population studies. The most egregious exam- 
ple of the latter is provided in cases where a graded 
average response hides the all-or-nothing behavior of 



single cells [US [3]. Variability of cellular response 
can have many origins, which are generally classi- 
fied as extrinsic and intrinsic noise or fluctuations [4]. 
The source of intrinsic fluctuations is the random oc- 
currence of reactions that can lead to variability for 
genetically identical cells in identical, fixed environ- 
ments. Extrinsic fluctuations can have multiple ori- 
gins, such as variations from cell to cell in the number 
of regulatory molecules, or signaling cascade compo- 
nents, or fluctuations in cytoplasmic and nuclear vol- 
umes. Many studies, both experimental and theo- 
retical from bacteria to eukaryotes have been under- 
taken to disentangle intrinsic and extrinsic fluctua- 
tions si n © mm no nam- 

Intrinsic fluctuations arise from either noisy tran- 
scription or translation or both, the effects of which 
can be measured in single cell mRNA and protein 
experiments. The simplest model of protein number 
distributions is to consider both transcription and 
translation as Poisson processes [7\. Recent exper- 
imental studies of mRNA distributions have shown 
strong evidence for transcriptional noise beyond what 
can be described by a simple Poisson process. In 
particular, transcriptional pulsing, where bursts of 
transcription alternate with quiescent periods, has 
been observed in both prokaryotes and eukaryotes. 
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Raser and O'Shea [5], who studied intrinsic and ex- 
trinsic noise in Saccharomyces cerevisiae, showed that 
the noise associated with a particular promoter could 
be explained in a transcriptional pulsing model and 
confirmed it by mutational analysis. Transcriptional 
bursts were recorded in E. coli |12j by following 
mRNA production in time, and their statistics com- 
puted. Evidence for a pulsing model of transcription, 
obtained from fluorescent microscopy, has also been 
presented for the expression of the discoidin la gene 
of Dictyostelvum [13] ■ Transcriptional bursts have 
as well been detected in Chinese hamster ovary cells 
[10] . In these experiments the production of mRNA 
occurs in a sequence of bursts of transcriptional activ- 
ity separated by quiescent periods. Transcriptional 
bursting, an intrinsically random phenomenon, thus 
becomes an important element to consider when eval- 
uating cell to cell variability. One can predict that in 
many cases it will be a significant part of overall noise, 
and most certainly of intrinsic noise. It has been spec- 
ulated that pulsatile mRNA production might permit 
"greater flexibility in transcriptional decisions" |13j . 
Cook et al. Q3] have argued that different aspects 
of haploinsufficiency can be connected to time scales 
associated with transcriptional bursting. 

Our study focusses on the consequences of tran- 
scriptional bursting in a simplified model of transcrip- 
tion that has been the subject of many studies and 
is believed to encapsulate the key features of burst- 
ing [2] El [10] [M [15] [16]. The complex phenomena 
that can occur in transcription (chromatin remodel- 
ing, enhanceosome formation, preinitiation assembly, 
etc.) are modeled through positing two states of gene 
activity: an inactive state where no transcription oc- 
curs, and an active one, in which transcription occurs 
according to a Poisson process. The production of 
mRNA is thus pulsatile: temporally there are peri- 
ods of inactivity interspersed with periods or bursts 
of transcriptional activity. Qualitative features of 
this model were presented in reference [2], and as- 



pects of it relating to bursts explored and discussed 
in reference [12] . Raj et al. [TO] provided a steady 
state solution to the Master Equation of the tran- 
scriptional model considered here, and analyzed it for 
some ranges of the rate constants. Given the range of 
time scales that can occur in transcriptional processes 
in different organisms, it is imperative to highlight 
the most significant behaviors that can arise in this 
model and investigate how these depend on the many 
time scales. In this paper, we provide a comprehen- 
sive analysis of a transcriptional pulsing model with 
an exact solution to the time- dependent Master Equa- 
tion for mRNA production. The advantage of this 
model is that it is amenable to such an analytic de- 
termination of the probability distribution of mRNA 
copy number as a function of time. We find that the 
system exhibits a surprising variety of distributions 
of mRNA number: this includes a bimodal distri- 
bution with power-law behavior between the peaks 
that evolves into a scale-invariant power-law distri- 
bution as we vary the rates of activation and inacti- 
vation. In some systems the mRNA distribution may 
not reach steady state, and it is therefore necessary 
to determine the time evolution of the distributions 
and characterize the time scales over which steady 
state is attained. Our time-dependent analytic solu- 
tion allows us to address these issues in detail, re- 
vealing in particular that the mRNA lifetime plays a 
key role in shaping the mRNA distribution. Cellular 
behavior is however determined by proteins and not 
the corresponding mRNA. Therefore, an important 
question is to what extent the protein distributions 
follow the mRNA distributions obtained as a result 
of transcriptional pulsing. To answer this question, 
we have performed numerical simulations of a model 
using the Gillespie algorithm [17] in which proteins 
are produced in a birth-death process from mRNA. 
When the protein decay rates are much larger than 
the mRNA decay rate the protein distributions re- 
flect the mRNA distributions; when the protein de- 



cays more slowly the protein distribution can be very 
different from that of the parent mRNA, even in the 
steady state. We delve deeply into the structure of 
the transcriptional bursting model, highlighting how 
the shapes of mRNA distributions depend on the ra- 
tios of time scales, determining over which time scales 
distributions evolve to steady state, and stressing the 
importance of considering the full distribution rather 
than characterizing it solely by its average and vari- 
ance. We thus provide an overview of possible be- 
haviors which yield a framework for interpreting ex- 
perimental results on transcriptional bursting across 
prokaryotes and eukaryotes. 

Results 

We study a model of transcriptional pulsing described 
by the following reactions where D and D* denote the 
gene in the inactive and active states respectively: 
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The first equation describes the switching "on" and 
"off" of the gene at the rates c/ and Cb respectively. 
The second and third equations describe transcrip- 
tion of the mRNA at a constant rate fc;, in the active 
state and the subsequent degradation of the mRNA 
at a rate kd. We present results for P(m, t), the prob- 
ability for the cell to contain m mRNA molecules at a 
time t that describes cell-to-cell variability of mRNA 
copy number as a function of time. 
Time scales. The steady state and temporal behav- 
iors of the mRNA distribution depend on the rate 
constants that control the time scales of various pro- 
cesses. Therefore, we begin by briefly summarizing 
the different time scales that arise in the model. The 
model has four rate constants, the forward and back- 
ward rates for the gene to switch between the ac- 
tive and inactive states and the transcription and 
degradation rates that govern mRNA numbers, lead- 



ing to three independent, dimensionless ratios. The 
equation obeyed by the probability for the DNA 
to be in the excited state, denoted by Qi(t) can 
be obtained directly from Equation (1): dQi/dt = 
Cf — [c.f +Cb) Qi ■ This shows that the effective DNA 
relaxation rate to the steady state is governed by 
Cf + Cb. The mean mRNA number obeys the equa- 
tion d{m(t))/dt — —kd{m) + kbQi(t). It is thus 
clear that the temporal behavior of the mean mRNA 
number is determined by the rates kd and c/ + Cb, 
the latter entering since it determines the dynam- 
ics of the transcriptionally active state. As long as 
kd < Cf + Cb, the mRNA decay rate sets the time 
scale over which relaxation to the steady state oc- 
curs. We find that the results of our exact solution 
can be interpreted in the most natural and trans- 
parent way when we measure time in units of k^ 1 , 
i.e. in terms of the mRNA lifetime. Thus we will 
use the three dimensionless ratios kb/kd, Cf/kd, and 
Cb/kd to organize our results. The mean number of 
mRNA in the steady state is given by the product of 
Cf/(cf + Cb), the fraction of the time the gene is in the 
activated state and kb/kd, the mean value of mRNA 
if the gene is always 'on'. The ratio kb/kd clearly sets 
the scale for the number of mRNA and increasing 
it extends the range over which P(m) is appreciable 
without a significant change of shape. The remain- 
ing ratios c/ /kd and Cb/kd determine the shape of the 
distribution. 

Superposition of Poisson distributions. We begin by 
providing an intuitively appealing way to view our 
exact result for P(m,t). The key conclusion is that 
P(m,t) can be pictured as a superposition of Poisson 
distributions with different mean values. If the gene 
is always "on" the mRNA distribution in steady state 
is Poisson with the Poisson parameter, A, given by the 
mean kb/kd, the ratio of transcription and degrada- 
tion rates. Since the gene flips between the "on" and 
"off" states with the rates determined by c/ and Cb, 
the mRNA distribution is determined by a stochas- 



tic transcription rate kt^(t) where £(£) is a dichoto- 
mous noise that assumes values or 1 corresponding 
to the gene being in the inactive or active state re- 
spectively. The dynamics of the random variable £ 
is determined by the stochastic chemical reaction de- 
scribed by Equations (1). Thus the distribution of 
the mRNA number is described by a Poisson process 
in which the parameter A itself is stochastic, a process 
called a doubly stochastic Poisson process [18] . 

Consider observing a particular cell at a time 
T. The number of mRNA at time T is distributed 
according to a Poisson distribution with parameter 
A(T) that depends on the time history of £(£) from 
to T describing the sequence of flips between the on 
and off states in that cell. This time history corre- 
sponds to a series of pulses of unit height with both 
the widths of the pulses and the intervals between 
pulses independently and exponentially distributed 
with parameters c/ and Ct respectively. Different cel- 
lular behaviors correspond to different realizations of 
the random sequence of pulses. Thus cell to cell vari- 
ability in mRNA copy number is given by a superpo- 
sition of Poisson distributions with parameter A that 
is itself a random variable: 

/\ro 
d\p(\,t)e- x — - [4] 

m! 

where p(A, t) is the probability density of the random 
variable A. The fraction of cells with m copies of 
mRNA at time t is determined by p(A,i). Such su- 
perpositions have been considered in the context of 
stochastic processes, for example, in |19| . This rep- 
resentation provides an attractive conceptual frame- 
work for understanding the transcriptional pulsing 
problem. We shall elaborate elsewhere on how the 
different forms of p(A, t) in the different regions of pa- 
rameter space allow us to interpret the corresponding 
behaviors of P(m,t). 

Steady state distributions. We now describe the vari- 
ety of steady state distributions that occur in differ- 
ent regions of parameter space. In view of the discus- 
sion of time scales in the previous section, it is natu- 



ral to classify the distributions by plotting Cf/kd and 
Cb/kd along the x- and y-axes for fixed kb/kd- The re- 
sults for fixed value kb/ kd — 100 are displayed in Fig- 
ure [T] and provide a bird's eye view of the strikingly 
different mRNA distributions that arise in different 
regions of parameter space. We recall that the exper- 
iments are performed on a variety of organisms both 
prokaryotic and eukaryotic. While rate constants are 
not known, given the different time scales involved 
in the experiments, we have chosen to investigate a 
range of values of c/ /kd and Cb/kd that encompass 
different biologically significant cases: for example, 
our choices include the vastly different rate constant 
values in the experiments of Raser and O'Shea [5] and 
Raj et al. [TO] . 

We start with the interesting case displayed in the 
bottom left figure in Figure [l] when the mRNA half- 
life is shorter than the the time scales over which the 
gene turns on or off, i.e., Cf,Cb < kd- In the steady 
state at any given time, the gene is off in some cells. 
Since the mean duration of the pulse 1/q, > 1/kd 
the transcripts produced in the previous occurrence 
of the on state would probably have decayed and so 
the number of transcripts will usually be small in 
these cells. This causes a peak in the mRNA distri- 
bution near m = 0. In those cells in which the gene 
is on at the time of observation the number of tran- 
scripts can display a broad range of values depend- 
ing on how long the gene was active as compared to 
the mRNA lifetime. Thus, we expect to observe a bi- 
modal distribution, as was qualitatively argued in [2]. 
The result is shown in the lower left quadrant of Fig- 
ure [l] One finds a peak at m = (with a weight that 
can be computed analytically) and another peak at 
large (~ kb/kd) m values. If the values of c/ and C;, 
are such that the peaks are well-separated, much of 
the intermediate region displays a power-law behav- 
ior. This reflects the broad range of times for which 
the gene has been active in different cells at the time 
of observation. It is useful to remark that bimodal 



distributions have been obtained in models with feed- 
back [20]. In contrast, in the transcriptional pulsing 
model bimodality is obtained without the presence of 
a feedback loop. 

Now imagine that we keep c/ fixed and vary ct 
so that it is larger than the decay rate. This leads to 
a scale-invariant, power-law behavior over a signifi- 
cant range of mRNA values. This simple power-law 
decay obtained in the case Cf < kd and Cb > fed is 
illustrated in the lower right quadrant of Figure Q] 
This case has been treated analytically in [151 110] . 
The mRNA distribution can be fitted by a Gamma 
distribution, which for appropriate values of the rate 
constants, shows power-law behavior over a substan- 
tial range. 

When both the activation and inactivation rates 
are rapid, i.e., Cf,Cb 3> kd, eliminating the fast re- 
actions naively yields a simple birth-death process 
for the mRNA with an effective transcription rate 
kbXCf/(cf+Cb). This would lead one to expect a Pois- 
son distribution for the mRNA number. However, in 
this 'quadrant', i.e. for Cf,Cb > kd, the observed 
distribution has a broad single-humped shape as dis- 
played in the upper right quadrant of Figure [l] much 
broader than a Poisson distribution. This broaden- 
ing occurs because the parameter A itself is stochas- 
tic. When Cf > kd > Cb the gene is on most of the 
time. Not surprisingly, the distribution is Poisson 
to a very good approximation as seen in the upper 
left quadrant of Figure [l] In the intermediate re- 
gion when Cf ,Cb ~ kd the distribution interpolates 
between these different possibilities and is rectangu- 
lar when they are equal (see Figure [l] center). 

Time evolution of probability distributions. Because 
of the range of possible time scales, it can happen 
that that the time when measurements are made, 
the biological system has not attained steady state. 
For this reason, we now present results for how the 
distributions evolve to a steady state from an ini- 



tial state with no mRNA and the gene in its inactive 
state. Using the time-dependent result for the distri- 
bution (see Material and Methods, equation (9)), we 
evaluate the evolution using Mathematica and plot 
the complete probability distribution as a function of 
time. Consider the case Cf,Cb < kd, (bottom left in 
Figure [T]), where the mRNA distribution displays bi- 
modality. Here the mRNA decay rate sets the scale 
for approach to the steady state. Figure [5Ja) shows 
the evolution of the bimodal distribution as a func- 
tion of time. For the given initial condition the sec- 
ond peak away from zero develops after a period of 
roughly twice the mRNA half-life. Steady state be- 
havior sets in at about 4 to 5 times k^ 1 . It is clearly 
possible that, depending on the relative values of the 
cell cycle time and the mRNA half-life, steady state 
and therefore, full bimodality may not be observable. 
Consider now the time evolution of the distribu- 
tion that evolves into the "pure" power law behavior 
featured in the bottom right of Figure Q] In Fig- 
ure[2jb) we plot P(m, t) vsmona double logarithmic 
plot. We have chosen c/ = 0.25kd and Cb — 2.5kd to 
illustrate this case. Larger values of the transcription 
rate will lead to a larger range over which the power 
law behavior obtains. It is clear that the exponent of 
the power law increases in magnitude with time and 
saturates at the steady state value for t greater than 
about 4fc d " 1 . Thus the shape the distribution depends 
crucially on the time (measured in units of the decay 
time) when experimental measurements are made. 

Mean and variance versus full distribution. From the 
examples given in Figure 1 it is clear that the com- 
plete probability distribution of mRNA number is re- 
quired to characterize the behavior of the transcrip- 
tional pulsing model. Nevertheless, for completeness, 
we make some remarks concerning attempts to repre- 
sent a mRNA distribution by its mean and variance 
only. 

We recall the expressions for the mean and vari- 
ance in mRNA number in the transcriptional puis- 



ing model reported in [5]. The mean is given by 
/i = kbCf /(kd(cf + cj)) and the variance by 



2 _ K b c f 



kd Cf + c b 



1 + 



k b c b 



(Cf +c b )(k d + c f + c b ) 



The first term in (5) is equal to the mean while the 
second term arises from the stochasticity in the puls- 
ing process. It can be shown [IS] that 

* 2 = (\)+al [6] 

Thus, in a doubly stochastic birth-death process the 
variance in mRNA number has an additional contri- 
bution due to the stochasticity of gene activation and 
inactivation. 

There are two popular measures of noise in terms 
of the first two moments of a probability distribution: 
the coefficient of noise, £, defined as the ratio of the 
standard deviation a to the mean \x, and the noise 
strength or Fano factor, r\, defined as the ratio of the 
variance a 2 to the mean . The latter has the value of 
unity for a Poisson distribution and is therefore con- 
venient for describing deviations from Poisson behav- 
ior. In Figure [3] we display constant rj contours as a 
function of c/ /kd and c b /kd for a fixed value of k b /kd 
on a logarithmic scale to encompass a broad range of 
parameter variation. When c/ < kd and c b > kd, the 
steady state distribution P(m) is monotonically de- 
creasing and has a power law region. In this region, 
to a first approximation r\ is independent of c/ (and 
~ 1 + k b /(kd + Cb)) and the contours are roughly par- 
allel to the Cf axis. This emphasizes the possibility 
that a 2 /[i is a constant for systems with power-law 
behavior in which c/ varies over a broad range of val- 
ues. Since as we show later, the protein distribution 
can reflect the behavior of the corresponding mRNA 
distribution, the protein distribution can show a simi- 
lar constancy of the Fano factor. Such a behavior has 
been observed experimentally in [2T] where a pulsing 
model was discussed. In the region where Cf,c b < kd, 
then r\ ~ 1 + (k b /kd)/(l + Cf/cb) and thus depends 
only on Cf/c b . This is consistent with the contours in 



this region being straight lines with slope 1. For the 
region with Cf,c b > kd, there is rapid switching be- 
tween on and off states and the Fano factor depends 
weakly on the rates c/ and c b . 

There is danger in characterizing distributions 
solely by their mean and variance. The variety of 
possible mRNA distributions across cells shown in 
Figure [l] demonstrates this. One of the interest- 
ing results in Reference [5] showed the decrease in 
the noise strength (Fano Factor) with increase in the 
mean for genes with different activation rates. Here 
we show that a wide variety of distributions under- 
lies this correlation between the noise strength and 
the mean. The increase in the mean can be ob- 
tained in the model through an increase in the ac- 
tivating rate, namely the rate c/, and experimentally 
through mutations of an appropriate promoter [5]. 
Even though a smooth curve is obtained for the de- 
crease of noise strength with the mean, we illustrate 
how the full mRNA distribution can differ for differ- 
ent points along the curve. For specificity, we choose 
parameter values k b = 200fcd and c b = kd, and vary 
the forward rate c/ for gene activation which changes 
the mean value. The result is shown in Figure |4ja) 
and is similar to that obtained experimentally. Now 
we examine the full probability distribution at three 
values oi Cf, namely c/ = O.lfcd, kd, and lOfcd, which 
correspond to mean values of 18, 100, and 181 re- 
spectively. The mRNA distributions are shown in 
Figure |4jb): the distribution ranges from power-law 
decay of P(m) for c/ = O.lkd to a broadened Pois- 
son distribution for cj = lOfcd- Furthermore, as we 
pointed out earlier, the value of mRNA degradation 
rate plays an important role in determining the type 
of mRNA distribution, a role not apparent in the 
regimes discussed in [5]. 

Since we have an analytic expression for the com- 
plete distribution we can use an information-theoretic 
characterization of the mRNA probability distribu- 
tion, the Shannon entropy. We evaluate the Shan- 



non entropy for different values of the rate constants. 
While the flat distribution clearly corresponds to high 
entropy, the power-law distribution also yields large 
values of the entropy indicating that greater informa- 
tion content than the other distributions. The results 
are presented in Supplementary Section B. 

Protein distributions. Given the variety of mRNA 
distributions that can result from genes undergoing 
transcriptional pulsing, it is important to understand 
how this affects the probability distributions for the 
corresponding protein. While a careful answer to this 
question would require detailed modeling of mRNA 
translocation and translation, we address this issue in 
a model in which translation is treated as a Poisson 
process [7J. Thus, we use 
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The effective protein degradation rate would include 
contributions from cell division, dimerization and 
other gene specific processes involving the loss of pro- 
teins. From our results we identify two regimes, one 
in which the protein distribution is similar to the 
mRNA distribution and another in which they are 
different. 

The protein distribution qualitatively mirrors the 
mRNA distribution if the protein dynamics is faster 
than the mRNA dynamics: translation and protein 
degradation occur at a rate higher than the mRNA 
degradation rate kd (with cj + C\, of the order of kd)- 
In this case, bimodals give rise to bimodals, power- 
laws give rise to power-laws and so on. The results 
are displayed in Figure [5] where the protein distribu- 
tions for two of the cases illustrated in Figure [T] are 
shown along with the corresponding mRNA distribu- 
tions. We show results for the protein degradation 
rate pd set to twice kd, the mRNA degradation rate. 
Figure [SJa) has rate constants c/ , c& such that the 
mRNA distribution exhibits bimodality, as seen in 
the inset. The protein distribution is also clearly bi- 



modal for this case. Similarly, in Figure [SJb), a long 
tailed mRNA distribution with a power law region 
gives rise to a similar protein distribution when the 
other rate constants are appropriately chosen. 

The other important case is when the proteins 
are relatively stable compared to the mRNA. This 
case is more complex; however, there are ranges of 
rate constants for which the protein and mRNA dis- 
tributions are vastly different. We illustrate this with 
two examples. If the rate constants are chosen such 
that the mRNA distribution is bimodal (cf. Figure 
1), then P(p) is qualitatively different from P(m) 
if Pd < Cf + Cb. The protein distributions for this 
case may be either monotonically decreasing or bell- 
shaped, depending on whether pd is less than one or 
both of Cf,Cb- This contrast between the protein and 
the mRNA distributions is illustrated in Figure |6{a). 
A second case is displayed in Figure 6(b) for values of 
the rate constants that lead to a power-law distribu- 
tion for the mRNA. Here a bell-shaped distribution is 
obtained for P(p) when pd « Cf < kd < Q,. These 
examples demonstrate that one has to be careful in 
inferring the shape of one of the mRNA or protein 
distributions from the other. 

It is worth noting that even if mRNA numbers 
are small, even as low as O(10), the above conclusions 
continue to hold. Thus, a wide variety of protein dis- 
tributions that may or may not reflect the underlying 
mRNA distribution could be realized in real biologi- 
cal systems when the gene undergoes transcriptional 
pulsing. 

It has been argued recently [22] that if the ef- 
fective protein degradation is very slow compared to 
that of the mRNA, the protein distribution can be ap- 
proximated by a gamma distribution. While gamma 
distributions do provide a good fit for some regions of 
parameter space, none of the distributions obtained 
in the bimodal quadrant can be reasonably approxi- 
mated by a gamma distribution. 



Discussion 

In this paper, we have presented results for the time- 
dependent and steady-state probability distributions 
for mRNA in a transcriptional pulsing model. A vari- 
ety of mRNA distributions occur in different regimes 
of rate constants. Our aim is to provide a guide for 
the interpretation of data on cell-to-cell variability 
that could arise from transcriptional pulsing. Tran- 
scriptional pulsing, entailed by the dynamics of chro- 
matin remodeling, reinitiation and similar processes, 
appears as a straightforward mechanism leading to 
bimodality and also to mRNA distributions with long 
tails. Long-tailed distributions of mRNA have been 
seen in a variety of systems: the experiments of Raser 
and O'Shea [5] show evidence for long tails which 
they attribute to transcriptional pulsing. In experi- 
ments on the gene ActB in cells from mouse pancre- 
atic islets the distribution of mRNA number, m, was 
found to be consistent with a log-normal distribution 
that would correspond to P(m) ~ mT 1 over some 
range of m [23] . More recently, a similar distribution 
of the Interferon-/? gene transcripts has been found 
in human dendritic cells [TT]. For the latter two ex- 
periments our results should help clarify whether the 
origin of the mRNA behavior lies in transcriptional 
pulsing. However, cell behavior is controlled not by 
mRNA but by the proteins they encode. Therefore, 
it is crucial to determine whether the protein distri- 
bution follows the corresponding mRNA distribution. 
We have determined when the two distributions are 
similar but also identified situations when the pro- 
tein distributions are strikingly different from those 
of the mRNA. In general, and for these situations in 
particular, our results on the range of cell-to-cell vari- 
ability of mRNA and protein responses due to tran- 
scriptional pulsing should provide significant help in 
interpreting experiments. 



Materials and Methods 

The results presented and discussed here for mRNA 
are based on the exact form of the distribution func- 
tion P(m,t) of the mRNA number, m at time t. We 
have solved the Master Equation [19] for reactions 
(l)-(3) that describes the time evolution of the dis- 
tribution to obtain these results. We found it con- 
venient to work with the generating function defined 
by G(z,t) = Y.Z=o z m P(m,t). If we can evalu- 
ate G(z, t) exactly, then the probability of having 
m mRNA transcripts at time t can be obtained by 
extracting the coefficient of the z m term. We have 
computed the generating function exactly (See Sup- 
plementary Section A for the details) for the initial 
condition with zero mRNA, i.e., P(m, 0) = 5 m ,o, and 
find 

G(z, t) = F a (t)$(c f ,Cf + c b ; -k b (l - z)) 
+ F ns (i)(l - z)$(l - C b , 2 - c S - c b ; -k b {l - *)) 

[9] 
where $ is the (Kummer) confluent hypergeometric 
function [241125] and the coefficients F 3 (t) and F ns (t) 
can be calculated explicitly in terms of confluent hy- 
pergeometric functions. The results are displayed in 
Supplementary Section A. At large times F s — 1 and 
F ns = 0. Thus in the steady state the generating 
function is given by( see also [10] ) 



G s (z) = $(cf,Cf + Cb;-k b {l-z)), 
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We can use the exact solution in Equation (9) 
to extract the time- dependent behavior of P(m,t) or 
Equation (10) for the steady state for different ranges 
of values of the rate constants. The results presented 
were obtained by extracting the coefficients of z m in 
the expansion of the generating function using Math- 
ematical]. While standard numerical simulations 
based on the Gillespie algorithm [17] can be employed 
to study both the steady state and the time evolution, 
the exact solution allows us to extract the results 
much more efficiently and explore the behavior of the 
system systematically in the space of rate constants 



without statistical errors, especially, when there are 
long tails present. The results for the protein dis- 
tributions were obtained from numerical simulations 
using the Gillespie algorithm. 
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Fig. 1. Steady state mRNA distributions P{ra) vs. m, labeled by {ct , c&) in units of k^. The mRNA transcription rate k^ is lOOA;^ for all the 
distributions. The figure shows prototype distributions for the five major regimes Cf, Ct, ^ k^ in parameter space. The distribution for Cf,Cb = k^ 
is flat. Bimodals are obtained in the lower left panel for Cf^c^ < k^ while a power law occurs in the lower right panel. 
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Fig. 2. Time evolution of P(m, t) towards steady state as a function of m at different time points (a) in the bimodal regime for Cf = 0.7bkd, 
Cfc = 0.5fcrf and kf, = lOOk^ at times t = 0.5, 1, 2, 3, 4 and oo in units of k^ . The steady state with bimodality is reached around 4fc^ 
(b) Log-Log plot of P(m, t) for Cy = 0.25fc^, c^ = 2.5/c^ and fc& = lOOfe^ at times t = 1, 2, 3, 4 and oo in units of &7 . The figure shows 
the time evolution of the power law region of P(m, t). For the different curves time increases from bottom to top with the slope increasing to its 
steady-state value. 
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Fig. 3. Contour plot of the noise strength(Fano factor) 7/ as c^ and Cf (in units of kd) are varied, for kfr = lOOO/c^. in the steady state; Cf, and 
Cf are varied on a log^Q scale over 5 decades. 9 contours for different values of 77 are placed at intervals of 100, from 1 to 1001 with T) increasing 
from light to dark values. 
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Fig. 4. Smoothly varying noise strengths as activation rate is varied can correspond to different probability distributions, (a) Variation of noise 
strength (Fano factor) with activation rate for k\y = 200k c [ and c^ = k^ (b) The steady state distributions P(TTl) corresponding to the (diamond- 
shaped) points marked in (a). The points in (a) going from right to left and the corresponding figures from top to bottom in (b) are for Cf /k^ = 10, 1, 
and 0.1 respectively. Figure (b) illustrates how different points on the same curve (a) can be associated with dramatically different mRNA distributions. 
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Fig. 5. Examples of steady state distributions of proteins reflecting those of mRNA. Protein distributions (a) for Cf = O.bkd, C-b — O.Skd and 
k b = lOOfcrf, pi, = 20kd and pd = 2kd (b) for Cf = O.bk^, c^ = 2kd and fej, = lOOk^, pi, = 20kd and pd = 2k^. The insets show the 
corresponding mRNA distributions. When the protein degrades faster than the mRNA, its distribution qualitatively mirrors the corresponding mRNA 

distribution. 
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Fig. 6. Examples of steady state distribution of proteins differing from those of mRNA. Protein distributions for (a) for Cf = OSkd, C\, = 0.2kd 
and kf, = 100kd, pb — l&d and pd = 0.05kd (b) for Cf = O.bkd, c& = 2kd and kb = lOOkd, Pb — l&d- The protein lifetime is 20 times 
longer than the mRNA lifetime. The insets show the corresponding mRNA distributions. When the protein degrades slowly compared to the mRNA, its 
distribution can be qualitatively different from the corresponding mRNA distribution. 
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Supplementary Section A 
Time-dependent solution to the Master Equation for transcriptional bursts 

For the set of reactions described by Equations 1-3 in the text we define Po(m,t) and 
Pi{m, t) to be the probability that at time t the cell has m mRNA molecules and the gene is 
in the inactive and active states respectively. It is straightforward to write down the Master 
Equation for the two probabilities: 

°^ ' = -c f P (m,t) +c 6 Pi(m,t) + k d [(m + l)Po(m + l,t)-mP {m,t)] (1) 

to = c f p o(m,t) - c b Pi(m,t) + fe d [(m + l)Pi(m + l,t)-mi'i(m,t)] 

<N '■ +k b [P 1 (m-l,t) - Pi(m,t)] (2) 

> 

O ■ We define the generating functions 

|>; G a (z,t) = J2 z m P a (m,t) 

m=0 

for a = and 1. The mRNA distribution (independent of the state of the gene) is determined 
r\)- by the sum G = Gq + G\. It is easy to deduce the equations obeyed by the generating 

q ■ functions from the Master Equations (with time re-scaled by kj): 

^?| d t G (z,t) = -c f G (z,t) + c b G 1 (z,t) + (l-z)d z Go(z,t) (3) 

d t Gx(z,t) = c f G (z,t) - Cbd&t) + (l-z)d z G 1 (z,t) - k b (l-z)G 1 (z,t). (4) 



> 



All the rate constants are measured in units of kj. 

We simplify the equations using an analog of the Galilean transformation by making the 
change of variables v = k b (l — z) and w = ve~ l = fc&(l — z) e _i . In terms of the transformed 
variables, we have 



^ ; vd v G = -CfG + c b Gi (5) 

vd v G\ = cjGq — c b Gi — vG\ . (6) 

/\ ' Adding the two equations we have the useful relation 

d v {G + G 1 ) = -G x . (7) 

Note that Gq(z, t) and G\(z, t) (and hence, their sum) are functions of v only and independent 
of w = k b (l — z)e~ t ; the dependence on w is determined by the boundary conditions. 

It is convenient to derive a second-order differential equation for G. Therefore we differ- 
entiate the equations for Go and G\ and obtian ([7]) 

vd%G + (1 + c f + c b + v)d v G + c f Go = (8) 

vdld + (1 + c f + c b + v)d v G 1 + (1 + C/ )Gi = . (9) 

1 



We add the two equations and use Equation ([7]) to obtain 

vd 2 v G + (c f + c b + v)d v G + c f G = . 

The substitution G(v) = e~ v F{v) shows that F(v) satisfies the confluent hypergeometric 
equation in the canonical form. The solution is given by 

F = A{w) $(c b , c f + c b ; v) + B (w) v 1 -^-^ $(1 - C/ , 2 - c f - c b ; v) . (10) 

Upon using the Kummer transformation, e~ v ^(a,'y; v) = $(7 — 0,7; —v), we obtain 

G = A(w)$(c f ,c f + c h ; -v) + B (u;)w 1 ~^~ C6 $(l - c b , 2 - c f - c b ; -v) . (11) 

In order to obtain a well-defined power series in v = k b (l — z) for the generating function we 
must impose 

B (w) = w c f +Cb B(w) = v c f +Cb e~ {c f +Cb)t B(w) . 

This yields the form 

G = A(w) $(c f ,c f + c b ; -v) + B(w) e -^f +Cb ^ v$(l -c b ,2-c f - c b ; -v) . (12) 

We impose the boundary conditions at t = which corresponds to w = v. The initial 
condition P(m, t = 0) = 8 m fl leads to 

G(w = v,v) = 1. (13) 

We assume that the gene is initially in the inactive state and thus Gi(z,t = 0) = 0. The 
additional condition that arises from Equation (|7j) implies 

d v G(w,v)\ w=v = 0. (14) 

Imposing these conditions we determine the unknown functions A and Bq. This involves 
judicious use of the Wronskian identity 

$(a-7 + l,l-7;z)$(a,7;z) -z$(a-j + 1, 1 -7; z)$(a + 1,7 + 1, z) = e z (15) 

7(1-7) 

that follows from from results in Ref. [1] and other identities to found there. The final result 

is 

G(z,t) = F s (t)$(c f ,c f +c b ;-k b {l-z)) + F ns (t)(l-z)$(l-c b ,2-c f -c b ;-k b (l-z)). (16) 

where 

F s (t) = $(-c / ,l-c / -c fe ;/c fe e- t (l-z)) and (17) 

F ns (t) = — t Cfk ^~ Z) re-^^l + cj + c^^l-^)). (18) 

[Cf + c b ){l - Cf - c b ) 

2 
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Figure 1: Contour Plot of the Shannon entropy defined in Equation [19] of P(m) for kf, = 
WOkd', separation between contours is 0.05 and the Shannon entropy decreases outward from 
the central contour. 



Supplementary Section B 
Characterization of noise in terms of Shannon Entropy 

In information theory the Shannon entropy serves as a measure of the average information 
content or the uncertainty of a random variable. We now present results for this measure of 
the noise in the mRNA distribution. Given the exact solution we can directly evaluate the 
Shannon entropy associated with the steady state distribution P(m) defined by 



S = — y P(m) log 2 P(m) 






(19) 



In Figure 1 we display contours of constant entropy as a function of the forward and 
backward rates Cf and c&. Not surprisingly, the values Cf /kd,Cb/kd ~ 1 yield the largest en- 
tropy since this choice leads to a uniform distribution in P(m). The power law distribution 
also provides a range of values for the rates in which larger values of entropy can be obtained. 
Since the Shannon entropy is a measure of the amount of information required to describe the 
random variable on average it can be helpful in the interpretation of data. For example, con- 
sider dendritic cells involved in providing innate immunity to an organism against pathogens. 
Assume that the mRNA or the protein produced by the cell to overcome a viral antagonist 
has a broad distribution. It is plausible that the greater the amount of information required 
to describe the distribution the lesser the chances of the pathogen being able to overcome the 



organism's immune system by random mutations. This can confer greater immunity against 
mutations in the virus that could evade the defence mechanisms of the organism. It is known 
that the interferon-/? mRNA distribution is broad in human dendritic cells [2]. 
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